{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Amazon QuickSight\n",
    "\n",
    "Amazon QuickSight is a very fast, easy-to-use, cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data, anytime, on any device. \n",
    "\n",
    "We will use Amazon QuickSight to visualize some characteristics of our Amazon Customer Reviews  dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Amazon QuickSight Account"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you already have an Amazon QuickSight account, you can access QuickSight at https://quicksight.aws.amazon.com.\n",
    "\n",
    "If you don't have an account yet, follow these steps:\n",
    "\n",
    "* In the AWS console, click on Services and then on QuickSight. Click on `Sign up for QuickSight`.\n",
    "\n",
    "* Select Standard and then click on Continue\n",
    "\n",
    "* Provide a name for the QuickSight account, enter an email address, and click Finish. \n",
    "\n",
    "* Click on `Go` to Amazon QuickSight and dismiss the tutorial.\n",
    "\n",
    "Once you have an account, you can add a dataset based on Athena, but first you need to allow the QuickSight service to access Amazon Athena and the underlying S3 resources. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Allow QuickSight To Access Athena and S3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Click on the `Admin` button on the top right corner and then on `Manage QuickSight`. \n",
    "\n",
    "* Click on `Account setting`s and then on `Manage QuickSight permissions`. \n",
    "\n",
    "* Select `Amazon Athena` and click on `Choose S3 buckets`, then click on `S3 buckets you can access across AWS` and provide the name `amazon-reviews-pds` under the Use a different bucket option. \n",
    "\n",
    "* Click on `Add S3 bucket` and then on `Select buckets`. \n",
    "\n",
    "* Click again on `Choose S3 buckets` and select your default S3 bucket to store results. Click `Select buckets`. \n",
    "\n",
    "* Click on `Apply` and then on the top left corner on the QuickSight logo. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup Amazon Athena"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to setup or manage, and you can start analyzing data immediately. You don’t even need to load your data into Athena, it works directly with data stored in S3. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order for us to use Amazon Athena to query our Amazon Customer Reviews Dataset, we first need to set up the service as follows: \n",
    "\n",
    "* Open the Amazon Athena console. If this is your first time visiting the Athena console, you’ll see a `Getting Started`. Choose `Get Started` to open the Query Editor.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Athena](img/chapter4-03.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* You might also get asked to set up a query result location in Amazon S3 if you are using Amazon Athena for the first time, click on the link shown below and define a S3 bucket, e.g. data-science-on-aws-\\<your-name\\>/athena/\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Athena](img/chapter4-02.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Next, create a database. In the Athena Query Editor, you see a query pane with an example query. Start typing your query anywhere in the query pane as shown below.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Athena](img/chapter4-04.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* To create our database, enter the following `CREATE DATABASE` statement, and then choose `Run Query`:\n",
    "\n",
    "`CREATE DATABASE dsoaws;`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Confirm that the catalog display refreshes and `dsoaws` appears in the `DATABASE` list in the Catalog dashboard on the left side."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Athena](img/chapter4-05.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Create a table. Now that we have a database, we’re ready to create a table that’s based on the Amazon Customer Reviews Dataset. We define the columns that map to the data, specify how the data is delimited, and provide the location in Amazon S3 for the file(s). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Make sure that `dsoaws` is selected for `DATABAS`E and then choose `New Query` and run the following SQL statement:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`CREATE EXTERNAL TABLE IF NOT EXISTS amazon_reviews_parquet(\n",
    "         marketplace string,\n",
    "         customer_id string,\n",
    "         review_id string,\n",
    "         product_id string,\n",
    "         product_parent string,\n",
    "         product_title string,\n",
    "         star_rating int,\n",
    "         helpful_votes int,\n",
    "         total_votes int,\n",
    "         vine string,\n",
    "         verified_purchase string,\n",
    "         review_headline string,\n",
    "         review_body string,\n",
    "         review_date bigint,\n",
    "         year int \n",
    ") PARTITIONED BY (\n",
    "         product_category string \n",
    ") \n",
    "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amazon-reviews-pds/parquet/'`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* You should see the newly created table amazon_reviews_parquet appear on the left under `Tables`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Athena](img/chapter4-06.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* As a last step, we need to load the Parquet partitions. To do so, just issue the following SQL command: \n",
    "`MSCK REPAIR TABLE amazon_reviews_parquet;`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Done! In just 3 steps we have set up Amazon Athena to connect to our Amazon Customer Reviews Dataset, and are now ready to visualize the dataset with Amazon QuickSight."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizing Dataset With Amazon QuickSight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* To add a dataset, click on `Manage data` on the upper right corner.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Manage Data](img/chapter4-07.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Click on `New data set` on the upper left."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Create New Data Set](img/chapter4-08.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Select `Athena`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Select Athena Type](img/chapter4-09.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Provide a `Data source name`, for example `amazon_reviews_parquet` and click on `Create data source`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![New Athena Data Source](img/chapter4-10.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Select the previously generated `dsoaws` database from the drop-down list, and our `amazon_reviews_paquet` table. Click on `Select`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Query Data Source](img/chapter4-11.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Choose `Directly query your data` and click on `Visualize`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Visualize Data](img/chapter4-12.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* You should now see an empty graph and our dataset columns listed on the left."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Visualize Data by Product Category](img/chapter4-13.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now let's start visualizing:\n",
    "\n",
    "* Select a `Horizontal chart bar` graph under `Visual types` in the lower left corner.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![SelectBarChart](img/chapter4-14.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Drag the `product_category` field to the y-axis and value boxes. The visualization should now look like this:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![FinalChart](img/chapter4-15.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use QuickSight to Analyze Data with Redshift"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You need to open the Quicksight CIDR range in the Redshift security group.\n",
    "\n",
    "https://aws.amazon.com/premiumsupport/knowledge-center/quicksight-redshift-private-connection/\n",
    "\n",
    "See for Quicksight CIDR ranges across regions here:\n",
    "\n",
    "https://docs.aws.amazon.com/quicksight/latest/user/regions.html\n",
    "\n",
    "![](img/quicksight_redshift_sg.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
